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METHOD AND SYSTEM FOR IMPLEMENTING A FAST RECOVERY 
PROCESS IN A LOCAL AREA NETWORK 

FIELD OF THE INVENTION 

The present invention relates to local area 
5 networks (LAN) . In particular, the present invention 
relates to a novel and improved method and system for 
implementing a fast recovery process in a local area 
network. 

10 BACKGROUND OF THE INVENTION 

The local area network (LAN) is a group of 
computers and associated devices that share a common 
communications line and typically share the resources 
of a single processor or server within a small geo- 

15 graphic area (for example, within an office building, 
within certain parts of IP backbone networks or within 
a network element, such as a telephone exchange or 
network control element) . In this context, the local 
area network can also mean an architecture that uses a 

20 so-called "loosely coupled multiprocessor" architec- 
ture and in which the messages between the processors 
are sent via Ethernet. This kind of architecture can 
be implemented, for example, in the IP trunks, MSC 
Servers (Mobile Switching Center, MSC) or network ele- 

25 ments such as Connection Prosessing Server (CPS) or 
Home Subscriber Server, which are used in "all IP' ar- 
chitectures of the third generation mobile networks. 

Usually, the server has applications and data 
storage that are shared in common by multiple computer 

30 users or central processor units (CPU's). The local 
area network may serve as few as two or three users or 
clients (for example, in a home network) or as many as 
thousands of users. The information is transmitted be- 
tween two clients or hosts using the paths from the 

35 first client to the second client. These paths are 
formed using the links between the two network ele- 
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ments. Typically the paths are formed beforehand. In 
redundant networks the first link of the host or cli- 
ent to the first network node is duplicated, thus al- 
lowing a recovery to the other path in a fault situa- 
5 tion of the first path. 

A router is a device or, in some cases, soft- 
ware in a computer that determines the next network 
point to which a packet should be forwarded toward its 
destination. The router is connected to at least two 
10 networks and it decides which way to send each infor- 
mation packet based on its current understanding of 
the state of the networks it is connected to. The 
router is often included as a part of a network 
switch. 

15 The switch is a network device that selects a 

path or circuit for sending a unit of data to its next 
destination. The switch may also include the function 
of the router, a device or program that can determine 
the route and specifically what adjacent network point 

20 the data should be sent to. In general, a switch is a 
simpler and faster mechanism than a router, which re- 
quires knowledge about the network and how to deter- 
mine the route . 

Relative to the layered Open Systems Inter- 

25 connection (OSI) communication model, a switch is usu- 
ally associated with Layer 2, the Data Link Layer. 
However, some newer switches also perform the routing 
functions of layer 3, the Network Layer. Layer 3 
switches are also sometimes called IP switches. 

3 0 On computer and telecommunication devices, a 

port is generally a specific place for being physi- 
cally connected to some other device, usually with a 
socket and plug of some kind. A link is a physical 
and, in some usage, a logical connection between two 

35 points. Both ends of the link are usually connected to 
the port . 
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In this context, the term "host" means any 
computer that has a complete two-way access to other 
computers in the network. A host has a specific "local 
or host number" that, together with the network num- 
5 ber, forms its unique address. A "host" is a node in a 
network. 

To maintain the operation of the network one 
has to take care of the fact that all the substantial 
elements are operational. The management of faults as 

10 a part of the network management improves the reli- 
ability of the network, thereby providing the main- 
tainer of the network and the network itself with the 
tools for promptly detecting the faults and correcting 
them. The responsibility of the management of faults 

15 is to arrange things so that problems and interrup- 
tions would be visible to the users as little as pos- 
sible . 

The devices in the network may send a notifi- 
cation of a critical situation every time there is a 

20 fault situation occurring (logging) . Examples of 
critical situations are, e.g. the rebooting of a de- 
vice or a response that was never received from a de- 
vice. In most of the cases, the management of faults 
based on merely this kind of information does not give 

25 a sufficient picture of the state of the network. For 
example, when some device is damaged, it is not always 
able to send a notification thereof. 

The devices of the network may be regularly 
asked about their status (polling) . Enquiries such as 

30 this enable one to detect the faults quite promptly. 
However, they take the capacity of the network from 
the actual payload. One has to balance between the de- 
tection accuracy and network capacity to be used, i.e. 
the greater the detection accuracy one wishes to have, 

3 5 the bigger part of the transfer capacity of the net- 
work is used. Other matters that have an influence on 
the selection of the polling interval are the number 
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of the devices to be monitored and the capacity of the 
links to be used. 

When the failure is detected, one has to ac- 
curately locate the fault and isolate the rest of the 
5 network from the disturbance caused by the fault. The 
network has to be configured or changed in such a way 
that the effects of the elimination of a component on 
the operation of the network are minimised. Finally, 
the network is reset by correcting or changing the 

10 faulty components. 

However, there are situations and network so- 
lutions in which the above-mentioned methods for the 
management of faults and most of all for the fault de- 
tection are not applicable because the fault has to be 

15 detected without delay. For example, in the internal 
network structure of a network element or IP network, 
the failure of a link combining two plug- in units may 
cause problems in an ongoing call or real time data 
connection, in which case the fault has to be detected 

2 0 very fast, in order that the call or connection would 

not be interrupted and that the users would not detect 
the fault. 

The standard method in a redundant local area 
network is to use the Spanning Tree Protocol (STP) or 

25 some vendor-specific, proprietary solution. The span- 
ning tree protocol and algorithm were developed by a 
committee of the IEEE (Institute of Electrical and 
Electronics Engineers) . Currently, the IEEE is at- 
tempting to institute enhancements to the spanning 

30 tree algorithm that will reduce network recovery time. 
The goal is to go from 3 0 to 60 seconds after a fail- 
ure or change in link status to less than 10 seconds. 
However, due to the long recovery time needed, the STP 
is not suitable for environments requiring fast (a 

3 5 maximum of few seconds) recovery. 

An alternative solution is that each IP host 
monitors that it has a functioning link to some criti- 
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cal part of the LAN (typically this is the router con- 
necting the host to the external IP network) . A simple 
method to implement the monitoring is to use the ICMP 
ECHO (ping) messages, which are sent to the router and 
5 to which it is supposed to respond. ICMP is a message 
control and error- reporting protocol between a host 
server and a gateway to the Internet . ICMP uses Inter- 
net Protocol datagram, but the messages are processed 
by the IP software and are not directly apparent to 

10 the application user. 

The major problem of the standard STP is its 
possibly slow recovery (recovery may take several tens 
of seconds during which time part or all of the LAN 
won't carry traffic). Vendor- specif ic solutions are 

15 much faster, but they require that all critical equip- 
ment (mainly LAN switches) be purchased from a single 
provider . 

There are also some problems with the ICMP 
ECHO method: It can be used only with links which have 
20 a corresponding IP address. That is, this method can- 
not be used if we have a redundant LAN port which does 
not have an IP address bound to it (for example, the 
port is just idling and can be used in case of a pri- 
mary LAN port failure) . The ICMP ECHO messages create 

2 5 some extra load for the LAN and especially for the 

router (or some other device which the host wants to 
ping) . Thus, it is not possible to monitor the func- 
tionality of the link constantly but only intermit- 
tently, for example, once in five seconds. Some ECHO 

3 0 messages can also be lost due to congestion and thus 

the recovery can be started only after a few unan- 
swered messages. As a result, even though the recovery 
itself can be very fast, the detection of a fault is 
still rather slow, taking from several seconds to ap- 
35 proximately 20 seconds. 

SUMMARY OF THE INVENTION 
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The present invention concerns a method and a 
system for accelerating fault recovery in a redundant, 
tree structured local area network. In this context, 
the tree structure means that there are no closed 
5 loops in the network. The tree is a directed non- 
cyclic network. The invention is used to define some 
of the LAN ports, which are, for example, used to con- 
nect the switch into the IP router, as critical ones. 
Likewise, some other LAN ports, used to connect the IP 

10 hosts to said switch, are defined as dependent of the 
critical links. If a critical LAN port or correspond- 
ing link is found to be non- functional , e.g. no car- 
rier sensed, all LAN ports or corresponding links de- 
pending on it are declared as non-functional. The dec- 

15 laration is done at link level in a way which allows 
the device (s) or ports connected to the other end of 
the link to notice that the link is not in use anymore 
to carry traffic. Thus, the net effect is that the 
knowledge of the fault at the upper level of the tree 

2 0 is propagated very fast down to the hosts, thus ena- 

bling fast recovery. 

The present invention may enable a considera- 
bly fast detection time of a failure taking about a 
second, perhaps even less. Because of this, the recov- 

25 ery time can be reduced significantly. Also the fault 
detection, according to the present invention, does 
not load the LAN, even though the load reduction is 
not likely to be significant. Also, the usability of 
the present invention does not require that there is 

30 an IP address bound to all ports (links) to be moni- 
tored. 

The present invention also overcomes the 
problems of the ICMP ECHO mechanism in the sense that 
the ICMP ECHO mechanism is an end-to-end verification 

3 5 of the path, whereas the present invention can guaran- 

tee that the physical path from the host to the exter- 
nal IP network or vice versa is in use. 
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Also the present invention can be implemented 
in a way which is compatible with the current LAN 
switches. The reason for this is the fact that the in- 
ventive mechanism does not require any protocol be- 
5 tween the LAN switches. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings, which are included 
10 to provide a further understanding of the invention 
and constitute a part of this specification, illus- 
trate embodiments of the invention and together with 
the description help to explain the principles of the 
invention. In the drawings: 
15 Fig 1 is a block diagram illustrating a net- 

work structure according to one embodiment of the pre- 
sent invention, and 

Figs 2a-2b describe a structure of the net- 
work element according to one embodiment of the pres- 
20 ent invention in more detail. 

DETAILED DESCRIPTION OF THE INVENTION 

Reference will now be made in detail to the 
embodiments of the present invention, examples of 

25 which are illustrated in the accompanying drawings. 

In Figure 1 there is described a redundant 
LAN, which has the topology of a tree. The term "re- 
dundant" means that the host connection has been du- 
plicated in order to allow a switch over from the ac- 

3 0 tive link Lli to the standby link Ll 2 in a link or a 
path failure situation. Also in figure 1 there is one 
active connection (traffic flow) described with the 
dash line. This connection is established between the 
Host 1 and the router Rl. It should be noted that the 

35 LAN topology in this example is such that there are at 
least two stages of LAN switches. 
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If one of the 1 st stage LAN switches SW1 , 
SW6 has failed, failure 1, or has been powered down 
for maintenance etc. there is not a big problem be- 
cause the hosts Host 1, Host 9 are connected di- 
5 rectly to the 1 st LAN switches and they can detect 
themselves when a link/LAN port goes from link-up 
state to link-down state. The recovery can be initi- 
ated immediately when the LAN driver software in the 
host notifies of the link-down situation. If one of 

10 the 2 nd stage LAN switches SW7 , SW8 has failed, failure 
2, the situation is the same if the link to the corre- 
sponding router or a link between a 1 st stage and 2 nd 
stage LAN switch has failed, the problem is that be- 
cause the hosts are not directly connected to the 2 nd 

15 stage LAN switch, they do not directly detect the 
failure. This is because the link from the host to the 
1 st stage LAN switch stays in the link-up state. The 
recovery only starts when the hosts find out that the 
router is not responding to the I CMP ECHO messages . 

2 0 But as mentioned above, this is not the best and fast- 
est way to start the recovery process . 

In the following there is described the idea 
of the present invention. In the failed 2 nd stage LAN 
switch SW7 it has been defined that the link LSW7 to 

25 router Rl, called up-link, is a critical link and the 
so-called down- links LSW1, LSW3 , LSW5 to the 1 st stage 
LAN switches SW1 , SW3 , SW5, are dependent of the 
critical up-link LSW7 . Thus, if the up-link LSW7 
fails, all down-links LSW1, LSW3 , LSW5 are set in the 

30 link-down state. Likewise, in the 1 st stage LAN 
switches SW1 , SW3 , SW5, the links LSW1, LSW3 , LSW5 to 
the 2 nd stage LAN switch SW7 are defined to be as 
critical and links Ll x , L2i, L3i L4 1# L5i, L6 X , L7 X , L8 X , 
L9i to the hosts 1, ,2, 9, down- links, are defined 

35 to be dependent of the up-links LSW1 , LSW3 , LSW5 . The 
net result is that if the 2 nd stage LAN switch or its 
link to the router fails, failure 2, then the link- 
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down state is propagated down to hosts Host 1, Host 
9. The same will happen if the link between a 1 st stage 
and 2nd stage LAN switch fails. Thus, the hosts become 
very quickly aware of a failure in the LAN and can 
5 start recovery immediately. 

One example of said recovery is that the host 
transfers to a predetermined default mode. This is the 
case if also the redundant up- link, e.g. link Ll 2 for 
Host 1, is in a link down state. For instance the Home 

10 Subscriber Server, an example of possible Host 1, is 
solving a profile of a certain user and it needs to be 
connected to the other network element (not shown) be- 
hind the routers Rl, R2 . If both links Ll x and Ll 2 are 
in link down state, the recovery in this example is 

15 that Host 1 uses a predetermined default profile for 
said user. The only important matter is that the host 
is notified as soon as possible of the link down 
situation of said active and redundant links. 

It must be noted that the necessary changes 

2 0 will be implemented in the LAN switches, even though 
co-operation with the host software is needed. The 
host moves all LAN traffic into the redundant LAN port 
if the currently used LAN port is changed into a link- 
down state. It must also be noted that there can be 

25 more than one critical link per LAN switch and that a 
link can depend on zero, one or more critical links. 
If a link depends on more than one critical links, the 
link will be put into a link-down state if any of the 
critical links is in a link-down state. 

30 In the case of a failed link, the LAN switch 

or router is repaired and put into operation, and all 
ports connected to it are put into a link-up state un- 
less otherwise specified by some management operation. 
As a result, all links dependent of it are also put 

35 into a link-up state unless overridden by management 
operations. This process is very much the same as in a 
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failure situation where the hosts are notified of the 
failure situation . 

The above described inventive mechanism can 
also be used to notify the hosts or the LAN switches, 
5 if there is something wrong with the transmit- 
direction of the connection. The idea is that normally 
a device cannot know whether or not it is transmitting 
properly or whether or not the receiving device is re- 
ceiving properly. However, it is possible to think of 

10 a link to be dependent of itself and change the state 
of the link into a link-down state if it is noticed 
that the device on the other end of the link is not 
receiving or sending properly, i.e. there are exces- 
sive CRC (cyclic redundancy check) errors, runt frames 

15 etc. 

In figure 2a there is described a coarse ex- 
ample of the LAN Switch structure according to one em- 
bodiment of the present invention. In figure 2b there 
is described a coarse example of the host or CPU unit 
2 0 structure according to one embodiment of the present 
invention. 

In both examples there is an Ethernet con- 
troller or Ethernet physical layer transceiver EC con- 
nected to the network element itself. The Ethernet 

2 5 controller EC is further divided at least in two com- 

ponents or modules which, of course, can be in the 
same circuit. These modules are the Media Access Con- 
troller MAC and the physical layer device PHY. The me- 
dia access layer communicates directly with the net- 

3 0 work adapter card and is responsible for delivering 

error- free data between two computers. The physical 
layer device PHY performs the same general function as 
a transceiver in the typical Ethernet system. 

For a typical network connection the data 
35 terminal equipment, LAN switch, host or CPU device 
(computer) contain an Ethernet interface EC which gen- 
erates and sends Ethernet frames that carry data be- 
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tween computers attached to the network. The interface 
or repeater port might also be designed to include the 
PHY electronics internally. In the present invention 
the Ethernet controller EC is designed to monitor the 
5 status of the active link. After the Ethernet control- 
ler has noticed a link-down situation, it "sends" in- 
formation about the situation downwards by setting the 
downward links into a link-down state. When the 
Ethernet controller in the host notices the link-down 

10 situation of the active link it notifies the host 
software, and the recovery can be started. 

In figure 2a there is described the implemen- 
tation of N ports into one LAN- Switch. The Ethernet 
Controller EC comprises n pairs of the media access 

15 controller MAC - physical layer device PHY. Physical 
layer devices are connected to the control logic, 
which typically can be implemented by a microprocessor 
in order to monitor and control the state of the PHY 
devices . 

20 The essential feature of the PHY devices is 

that they contain or provide an information signal 
and/or register that informs of the state of the link 
or port. It is also useful if the information can be 
monitored using software. Also the PHY device can pro- 

25 vide said information by producing an interruption to 
the microprocessor that can interpret this interrup- 
tion as a change of the state of the PHY device. An- 
other essential feature of the PHY device is that it 
can be reset into the state in which it does not give 

30 idle information to the other PHY device. In figure 
2a, the control of the above- mentioned two essential 
features is described using two different signal 
types. "Link Down" indication signals are sent from 
the PHY devices in order to inform the Control logic 

35 of the present situation of the link. Thus the PHY de- 
vices can be set into the state which can be recog- 
nised as a failure situation in the down link of said 



12 



devices. "PHY Reset" signals are used to set the PHY 
devices into the down state so that the other PHY de- 
vice in a down link direction can recognise the fail- 
ure in the up link direction, i.e. these signals dis- 
5 able the PHY devices. 

It is obvious to a person skilled in the art 
that with the advancement of technology, the basic 
idea of the invention may be implemented in various 
ways. The invention and its embodiments are thus not 
10 limited to the examples described above, instead they 
may vary within the scope of the claims. 



CLAIMS 

1 . A method for fast recovery of a host con- 
nection in a redundant tree structured local area net- 
work, characterised in that the method com- 

5 prises the steps of: 

monitoring the state of a critical up-link, 
setting a dependent down- link in a link- down 
state, if said critical up-link is detected to be in a 
link-down state. 
10 monitoring the state of a active up-link in 

the host device, and 

starting a recovery process in a host device 
if said active link is in the link-down state, 

2. The method according to claim 1, char- 
15 acterised in that specifying the up-link of a 

network element being a critical up-link, if the fail- 
ure of said link affects the data flow of a down- link 
of said network element . 

3. The method according to claim 1, c h a r - 
20 acterised in that specifying the link of a net- 
work element being a dependent down-link, if there is a 
critical up-link between said down-link and the next 
network element . 

4. The method according to claim. 1, char- 
25 acterised in that the recovery process comprises 

the steps of : 

notifying the host software of the link fail- 
ure in the active up-link, and 

changing the active data path to the redun- 
3 0 dant up-link. 

5. The method according to claim 1, char- 
acterised in that the recovery process comprises 
the steps of : 

notifying the host software of the link fail- 
3 5 ure in the active up-link, 

checking the status of the redundant up-link, 
and if said up-link is in link down state, 
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transferring said host to the predetermined 
default mode operation. 

6. The method according to claims 4 or 5, 
characterised in that said redundant up- link 

5 is a doubling up-link for said active up-link. 

7. The method according to claim 1, char- 
acterised in that monitoring the state of a 
critical up-link is accomplished by monitoring the 
quality of the data flow on the link. 

10 8. A system for fast recovering of a host 

connection in a redundant tree structured local area 
network, characterised in that the system 
comprises 

a monitoring device (EC) for monitoring the 
15 state of a critical up-link, for setting a dependent 
down-link in a link-down state, if said critical up- 
link is detected to be in a link-down state and for 
starting a recovery process in a host device if said 
active link is in the link-down state. 
20 9. The system according to claim 8, char- 

acterised in that said monitoring device (EC) 
further comprises 

a physical layer device (PHY) for monitoring 
the physical state of said up-link, and 
25 a media access controller (MAC) for changing 

the state of the down-link. 

10. The system according to claim 8, char- 
acterised in that the up-link of a network ele- 
ment (SW1, SW8) is a critical up-link, if the fail- 

30 ure of said link affects the data flow of a down-link 
of said network element . 

11. The system according to claim 8, char- 
acterised in that the link of a network element 
(SW1, SW8) is a dependent down- link, if there is a 

35 critical up-link between said down- link and the next 
network element (SW1, SW8) . 
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12. The system according to claim 8, char- 
acterised in that said monitoring device (EC) is 
an Ethernet controller. 



(57) ABSTRACT 

The present invention concerns a method and a 
system for accelerating fault recovery in a redundant, 
tree structured local area network. The invention is 
used to define some of the LAN ports, which are, for 
example, used to connect the switch (SW) into the IP 
router, as critical ones. Likewise, some other LAN 
ports, used to connect the IP hosts to said switch 
(SW) , are defined as dependent of the critical links. 
If a critical LAN port or corresponding link is found 
to be non- functional , e.g. no carrier sensed, all LAN 
ports or corresponding links depending on it are de- 
clared as non-f unctional . The declaration is done at 
link level in a way which allows the device (s) or 
ports connected to the other end of the link to notice 
that the link is not in use anymore to carry traffic. 
Thus, the net effect is that the knowledge of the 
fault at the upper level of the tree is propagated 
very fast down to the hosts, thus enabling fast recov- 
ery. The present invention may enable a considerably 
fast detection time of a failure taking about a sec- 
ond, perhaps even less. 
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